tags:
- multimodal
- deep_learning
- classification
- models
aliases:
- Multi-modal Learning
Multi-modal Learning with Deep Learning
Multimodal deep learning is the branch of machine learning in which the input spans several modalities, i.e., kinds of data that differ in nature and in how they are interpreted, such as sound and images.
Multi-modal learning with deep learning is a subfield of machine learning that focuses on training models to process and learn from multiple data sources (modalities). These modalities can be diverse and include:
The key objective is to leverage the complementary information present in these diverse data sources to create a richer understanding and improve performance on various tasks, such as:
In a late-fusion approach to multi-modal learning, separate sub-models are pre-trained on individual data modalities before being combined for the final task. This pre-training offers several benefits:
There are various techniques for pre-training sub-models, depending on the specific data modalities and desired task:
Overall, pre-training sub-models in a late-fusion approach benefits multi-modal deep learning in three ways: it reuses existing knowledge, reduces training complexity, and improves the feature representations available to the final task.
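The late-fusion setup described above can be illustrated with a minimal NumPy sketch. It is not a reference implementation. The two "pre-trained" sub-models are stand-ins (fixed random projections followed by a ReLU), and the names `make_frozen_encoder`, `audio_encoder`, `image_encoder`, and `fuse`, as well as all dimensions and the synthetic data, are assumptions made for illustration. Only the small fusion head on top of the concatenated features is trained, which is one common reading of how pre-training reduces training complexity.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_frozen_encoder(in_dim, out_dim, rng):
    """Hypothetical stand-in for a pre-trained sub-model.

    A real system would load trained weights here; this sketch uses a
    fixed random projection followed by a ReLU and never updates it.
    """
    W = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, out_dim))
    return lambda x: np.maximum(x @ W, 0.0)

# One frozen encoder per modality (dimensions are arbitrary assumptions).
audio_encoder = make_frozen_encoder(40, 16, rng)
image_encoder = make_frozen_encoder(64, 32, rng)

def fuse(audio, image):
    """Late fusion: encode each modality separately, then concatenate."""
    return np.concatenate([audio_encoder(audio), image_encoder(image)], axis=1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# Synthetic batch: 8 paired audio/image samples, 3 classes.
batch, n_classes = 8, 3
audio = rng.normal(size=(batch, 40))
image = rng.normal(size=(batch, 64))
labels = rng.integers(0, n_classes, size=batch)

# Features are computed once because the encoders stay frozen.
feats = fuse(audio, image)  # shape (8, 48)

# Trainable fusion head: a single linear layer followed by softmax.
W_head = np.zeros((feats.shape[1], n_classes))
b_head = np.zeros(n_classes)

loss_before = cross_entropy(softmax(feats @ W_head + b_head), labels)

# Plain gradient descent on the head only.
one_hot = np.eye(n_classes)[labels]
for _ in range(50):
    probs = softmax(feats @ W_head + b_head)
    grad_logits = (probs - one_hot) / batch
    W_head -= 0.01 * feats.T @ grad_logits
    b_head -= 0.01 * grad_logits.sum(axis=0)

probs = softmax(feats @ W_head + b_head)
loss_after = cross_entropy(probs, labels)
```

Because the encoders are frozen, the only trainable parameters are those of the fusion head, so the optimization reduces to softmax regression on the concatenated features. Fine-tuning the encoders jointly with the head is an alternative design that this sketch does not cover.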